
Heatmaps of Correlations

[Figure panels: All; E=0; E=1; E=1 - E=0; \(S = |\rho_{E=0} + \rho_{E=1} - 2 \rho|\); Fisher’s Z Statistic]
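The quantities \(S\) and Fisher’s Z named above can be computed for a single pair of genes as in the minimal sketch below. It assumes that \(\rho\) without a subscript is the correlation computed on all subjects, and that Fisher’s Z is the standard two-sample statistic for comparing the correlations in the E=0 and E=1 groups; the report states neither explicitly. The function names and the input values are hypothetical.

```python
import math

def fisher_z_difference(r0, n0, r1, n1):
    """Two-sample Fisher Z statistic comparing correlation r1 (n1 subjects)
    with correlation r0 (n0 subjects)."""
    z0 = math.atanh(r0)  # Fisher transformation of the E=0 correlation
    z1 = math.atanh(r1)  # Fisher transformation of the E=1 correlation
    se = math.sqrt(1.0 / (n0 - 3) + 1.0 / (n1 - 3))
    return (z1 - z0) / se

def s_statistic(r0, r1, r_all):
    """S = |rho_{E=0} + rho_{E=1} - 2 * rho|, with rho assumed to be the
    correlation over all subjects."""
    return abs(r0 + r1 - 2.0 * r_all)

# Hypothetical correlations for one pair of genes
z = fisher_z_difference(r0=0.2, n0=50, r1=0.8, n1=50)
s = s_statistic(r0=0.2, r1=0.8, r_all=0.4)
print(z, s)
```

Applying either function to every pair of genes gives a matrix that can be clustered in the same way as a correlation matrix.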

Simulation Scenario 1

This scenario is similar to the one presented in the protocol. We restrict ourselves to fitting interaction models only. The Eclust method is now an add-on method, in the sense that it also includes the clusters derived from the correlation matrix without considering the environment. We summarize each cluster by its first two principal components (PCs) and fit the model that contains both PCs as well as their interactions with E.
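As a minimal sketch of the predictors contributed by one cluster, the snippet below builds the columns from already-computed PC scores. The function name and the scores are hypothetical, and including E itself as a column is an assumption based on the \(\beta_E E\) term of the data-generating model.

```python
def interaction_design_row(pc1, pc2, e):
    """Predictors for one subject and one cluster:
    [PC1, PC2, E, PC1*E, PC2*E]. Including E itself is an assumption."""
    return [pc1, pc2, e, pc1 * e, pc2 * e]

# Hypothetical (PC1 score, PC2 score, exposure) triples for three subjects
scores = [(0.5, -1.2, 0), (1.5, 0.3, 1), (-0.7, 0.8, 1)]
design = [interaction_design_row(pc1, pc2, e) for pc1, pc2, e in scores]

for row in design:
    print(row)
```

For unexposed subjects (E=0) the interaction columns are zero, so the interaction coefficients only affect the fitted values of exposed subjects.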

We vary the following:

  1. the correlation parameter of the first block of genes for the exposed subjects (rho)
  2. the number of variables (p)
  3. the sample size (n)
  4. the number of active main effect genes (nActive); note that if a main effect is nonzero, then its interaction is automatically also associated with the response
  5. the matrix used to create the environment clusters (Ecluster_distance)
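The full factorial design over these five factors can be enumerated as in the sketch below. The values of n and nActive and the two Ecluster_distance labels are taken from the figure panel titles of this report; the values of rho and p are hypothetical placeholders, since the report does not list them.

```python
from itertools import product

grid = {
    "rho": [0.2, 0.9],        # hypothetical values, not stated in the report
    "p": [1000],              # hypothetical value, not stated in the report
    "n": [100, 200, 400],     # sample sizes shown in the panel titles
    "nActive": [10, 50, 100], # active-set sizes shown in the panel titles
    "Ecluster_distance": ["Fisher Scoring", "Difference of Correlations"],
}

# One dict per simulation scenario, covering every combination of factors
scenarios = [dict(zip(grid, combo)) for combo in product(*grid.values())]

print(len(scenarios))
print(scenarios[0])
```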

We generate the original predictors \(X_j, j=1, \ldots, p\) from a multivariate normal distribution. The correlation between predictors has a block diagonal structure such that, within a block, the correlation between \(X_j\) and \(X_{j'}\) is \(\rho_E\) for \(j \neq j'\), where \(\rho_E\) depends on exposure status. Let \(Y^* = \beta_E E + \sum_j \left(\beta_j X_j + \alpha_{jE} X_j E\right)\). We generate a continuous response \(Y = Y^* + k \cdot \varepsilon\), where the error term \(\varepsilon\) is generated from a standard normal distribution and \(k\) is chosen such that the signal-to-noise ratio \(\left(Var(Y^*)/Var(k \cdot \varepsilon)\right)\) is 1. Currently, only the differentially correlated block contains the active set.
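The data-generating process above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code used for the report: the function name, the alternating exposure assignment, and the parameter `rho_unexposed` (the first block's correlation for unexposed subjects) are hypothetical, the noise is taken to be \(k \cdot \varepsilon\) when computing the signal-to-noise ratio, and the shared-factor construction only supports non-negative correlations.

```python
import math
import random
import statistics

def simulate(n, p, block_size, rho_exposed, rho_unexposed, rho_other,
             beta, alpha, beta_e, seed=1):
    """Simulate X (n x p), binary E, and Y with block-equicorrelated predictors."""
    rng = random.Random(seed)
    X, E, y_star = [], [], []
    for i in range(n):
        e = i % 2  # assumption: subjects alternate unexposed / exposed
        x = []
        for start in range(0, p, block_size):
            if start == 0:  # first block is the differentially correlated one
                rho = rho_exposed if e == 1 else rho_unexposed
            else:
                rho = rho_other
            shared = rng.gauss(0.0, 1.0)  # factor shared by the whole block
            for _ in range(min(block_size, p - start)):
                own = rng.gauss(0.0, 1.0)
                # unit variance, pairwise correlation rho within the block
                x.append(math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own)
        X.append(x)
        E.append(e)
        y_star.append(beta_e * e + sum(
            beta[j] * x[j] + alpha[j] * x[j] * e for j in range(p)))
    # k equals the sample SD of Y*, so Var(Y*) / Var(k * eps) is 1
    k = statistics.pstdev(y_star)
    Y = [ys + k * rng.gauss(0.0, 1.0) for ys in y_star]
    return X, E, Y, k

# Hypothetical small example: 4 blocks of 5 genes, first 5 genes active
p = 20
beta = [4.0] * 5 + [0.0] * (p - 5)
alpha = [2.0] * 5 + [0.0] * (p - 5)
X, E, Y, k = simulate(n=200, p=p, block_size=5, rho_exposed=0.9,
                      rho_unexposed=0.2, rho_other=0.6,
                      beta=beta, alpha=alpha, beta_e=1.0)
print(len(X), len(X[0]), k)
```

Writing each predictor as a weighted sum of a block-level factor and an independent term gives every pair in the block correlation `rho` while keeping unit variances, which avoids needing a matrix factorization.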

Recall that the method is the general approach (univariate, penalization, clustering then regression, or environment clustering), while the model is what was used to obtain the coefficient estimates (linear model, lasso, or shim, the strong heredity interaction model). For example, in the MSE plot, avg_shim has two error bars: one corresponds to using average clusters as predictors, and the other to using average environment clusters as predictors.

The parameters of the data-generating model are:

  1. rho: the correlation for the exposed subjects in the differentially correlated block
  2. rhoOther: the correlation for all other blocks
  3. betaMean: the active main effect coefficients are generated from a Uniform(3.9, 4.1) distribution
  4. betaE: the coefficient of the binary environment variable
  5. alphaMean: the interaction effect coefficients are generated from a Uniform(1.9, 2.1) distribution
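The coefficient draws described by betaMean and alphaMean can be sketched as follows. The function name is hypothetical, and placing the active genes at the first `n_active` positions is an assumption made for illustration; the report only says that the active set lies in the differentially correlated block.

```python
import random

def draw_coefficients(p, n_active, seed=1):
    """Draw main-effect (beta) and interaction (alpha) coefficients.
    Every gene with a nonzero main effect also gets a nonzero interaction."""
    rng = random.Random(seed)
    beta = [0.0] * p
    alpha = [0.0] * p
    for j in range(n_active):  # assumption: active genes come first
        beta[j] = rng.uniform(3.9, 4.1)   # main effects
        alpha[j] = rng.uniform(1.9, 2.1)  # interactions with E
    return beta, alpha

beta, alpha = draw_coefficients(p=50, n_active=10)
print(beta[:3], alpha[:3])
```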

Currently we have results for about 50 simulations for each combination of p, rho, n, nActive, and Ecluster similarity matrix.

True Positive Rate vs. Number of non-zero fitted coefficients

[Figure panels: Fisher Scoring; Difference of Correlations]
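The two quantities on the axes of these plots can be computed from a fitted coefficient vector as in the sketch below. The function names, the variable names, and the coefficient values are hypothetical; the true positive rate is taken to be the fraction of truly active variables that receive a nonzero fitted coefficient.

```python
def selected_variables(coefs):
    """Names of the variables with a nonzero fitted coefficient."""
    return {name for name, value in coefs.items() if value != 0.0}

def true_positive_rate(selected, active):
    """Fraction of truly active variables that were selected."""
    active = set(active)
    return len(set(selected) & active) / len(active)

# Hypothetical fit: four truly active variables, three nonzero estimates
active = {"X1", "X2", "X3", "X4"}
fitted = {"X1": 3.8, "X2": 0.0, "X3": 4.2, "X4": 0.0, "X5": 0.1, "X6": 0.0}

selected = selected_variables(fitted)
n_nonzero = len(selected)
tpr = true_positive_rate(selected, active)
print(n_nonzero, tpr)
```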

Test Set MSE Fisher Scoring

[Figure panels: nActive = 10, 50, 100 by Sample Size = 100, 200, 400]

Test Set MSE Difference of Correlations

[Figure panels: nActive = 10, 50, 100 by Sample Size = 100, 200, 400]

Jaccard Index Fisher Scoring

[Figure panels: nActive = 10, 50, 100 by Sample Size = 100, 200, 400]

Jaccard Index Difference of Correlations

[Figure panels: nActive = 10, 50, 100 by Sample Size = 100, 200, 400]
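For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to two sets of selected variables; the report does not say which sets are compared, so treating them as the selections from two simulation runs is an assumption, as are the function name and the convention for two empty sets.

```python
def jaccard(a, b):
    """Jaccard index |A intersect B| / |A union B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)

# Hypothetical variables selected in two simulation runs
run1 = {"X1", "X2", "X3"}
run2 = {"X2", "X3", "X4"}
index = jaccard(run1, run2)
print(index)
```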

Spearman Correlation Fisher Scoring

[Figure panels: nActive = 10, 50, 100 by Sample Size = 100, 200, 400]

Spearman Correlation Difference of Correlations

[Figure panels: nActive = 10, 50, 100 by Sample Size = 100, 200, 400]

Pearson Correlation Fisher Scoring

[Figure panels: nActive = 10, 50, 100 by Sample Size = 100, 200, 400]

Pearson Correlation Difference of Correlations

[Figure panels: nActive = 10, 50, 100 by Sample Size = 100, 200, 400]